Kepler Scientific Workflow System

Kepler is a free software system for designing, executing, reusing, evolving, archiving, and sharing scientific workflows (Ludäscher B., Altintas I., Berkley C., Higgins D., Jaeger-Frank E., Jones M., Lee E., Tao J., Zhao Y. 2006. Scientific Workflow Management and the Kepler System. Special Issue: Workflow in Grid Systems. Concurrency and Computation: Practice & Experience 18(10): 1039-1065; Altintas I, Berkley C, Jaeger E, Jones M, Ludäscher B, Mock S. 2004. Kepler: An Extensible System for Design and Execution of Scientific Workflows. Proceedings of the Future of Grid Data Environments, Global Grid Forum 10; Michener, William K., James H. Beach, Matthew B. Jones, Bertram Ludaescher, Deana D. Pennington, Ricardo S. Pereira, Arcot Rajasekar, and Mark Schildhauer. 2007. "A Knowledge Environment for the Biodiversity and Ecological Sciences", Journal of Intelligent Information Systems, 29(1): 111-126). Kepler's facilities provide process and data monitoring, provenance information, and high-speed data movement. Workflows in general, and scientific workflows in particular, are directed graphs where the nodes represent discrete computational components, and the edges represent paths along which data and results can flow between components (Taylor, I.J.; Deelman, E.; Gannon, D.B.; Shields, M. (Eds.), “Workflows for e-Science: Scientific Workflows for Grids”, 530 p., Springer). In Kepler, the nodes are called 'actors' and the edges are called 'channels'. Kepler includes a graphical user interface for composing workflows in a desktop environment, a runtime engine for executing workflows within the GUI and independently from a command line, and a distributed computing option that allows workflow tasks to be distributed among compute nodes in a computer cluster or computing grid. The Kepler system principally targets the use of a workflow metaphor for organizing computational tasks that are directed towards particular scientific analysis and modeling goals. Thus, Kepler scientific workflows generally model the flow of data from one step to another in a series of computations that achieve some scientific goal.
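The actor-and-channel picture can be illustrated with a small sketch. This is not Kepler's API: the `Workflow` class, its method names, and the three example actors are hypothetical, and the sketch assumes an acyclic graph in which each actor fires exactly once.

```python
from collections import defaultdict, deque


class Workflow:
    """Toy directed graph: nodes are actors, edges are channels."""

    def __init__(self):
        self.actors = {}                   # actor name -> callable
        self.channels = defaultdict(list)  # source name -> destination names

    def add_actor(self, name, func):
        self.actors[name] = func

    def connect(self, src, dst):
        """Add a channel carrying the output of src to an input of dst."""
        self.channels[src].append(dst)

    def topological_order(self):
        """Order actors so that every actor comes after its upstream actors."""
        indegree = {name: 0 for name in self.actors}
        for dsts in self.channels.values():
            for dst in dsts:
                indegree[dst] += 1
        ready = deque(name for name, deg in indegree.items() if deg == 0)
        order = []
        while ready:
            name = ready.popleft()
            order.append(name)
            for dst in self.channels[name]:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
        if len(order) != len(self.actors):
            raise ValueError("workflow graph contains a cycle")
        return order

    def run(self):
        """Fire each actor once, passing results along the channels."""
        inputs = defaultdict(list)
        results = {}
        for name in self.topological_order():
            results[name] = self.actors[name](*inputs[name])
            for dst in self.channels[name]:
                inputs[dst].append(results[name])
        return results


# Hypothetical three-step workflow: read data, compute a mean, format a report.
wf = Workflow()
wf.add_actor("read", lambda: [3.0, 4.0, 5.0])
wf.add_actor("mean", lambda xs: sum(xs) / len(xs))
wf.add_actor("report", lambda m: f"mean={m:.1f}")
wf.connect("read", "mean")
wf.connect("mean", "report")
results = wf.run()
```

Here data flows from `read` through `mean` to `report`, so each step sees only what arrives on its incoming channels.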


Scientific workflow

A scientific workflow is the process of combining data and processes into a configurable, structured set of steps that implement semi-automated computational solutions to a scientific problem.
Scientific workflow systems often provide graphical user interfaces to combine different technologies along with efficient methods for using them, and thus increase the efficiency of the scientists.


Access to scientific data

Kepler provides direct access to scientific data that has been archived in many of the commonly used data archives. For example, Kepler provides access to data stored in the Knowledge Network for Biocomplexity (KNB) Metacat server (Jones, Matthew B., C. Berkley, J. Bojilova, M. Schildhauer. 2001. Managing Scientific Metadata. IEEE Internet Computing 5 (5): 59-68) and described using Ecological Metadata Language. Additional data sources that are supported include data accessible using the DiGIR protocol, the OPeNDAP protocol, GridFTP, JDBC, SRB, and others.


Models of Computation

Kepler differs from many of the other bioinformatics workflow management systems in that it separates the structure of the workflow model from its model of computation, such that different models for the computation of the workflow can be bound to a given workflow graph. Kepler inherits several common models of computation from the Ptolemy system, including Synchronous Data Flow (SDF), Continuous Time (CT), Process Network (PN), and Dynamic Data Flow (DDF), among others.
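The separation of workflow structure from model of computation can be sketched as one workflow definition executed by two interchangeable schedulers. The function names below are hypothetical and the two schedulers are only loosely inspired by the SDF and PN models; they do not reproduce Ptolemy or Kepler semantics. The sketch assumes a linear chain of single-input actors.

```python
import queue
import threading

_END = object()  # sentinel marking the end of a token stream


def sequential_director(chain, tokens):
    """Fire the actors in a fixed, precomputed order for each input token."""
    outputs = []
    for token in tokens:
        for _name, actor in chain:
            token = actor(token)
        outputs.append(token)
    return outputs


def process_network_director(chain, tokens):
    """Run each actor in its own thread, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(chain) + 1)]

    def run_actor(actor, inbox, outbox):
        # Block on the inbox, transform each token, forward the end marker.
        while True:
            token = inbox.get()
            if token is _END:
                outbox.put(_END)
                return
            outbox.put(actor(token))

    threads = [
        threading.Thread(target=run_actor, args=(actor, queues[i], queues[i + 1]))
        for i, (_name, actor) in enumerate(chain)
    ]
    for thread in threads:
        thread.start()
    for token in tokens:
        queues[0].put(token)
    queues[0].put(_END)

    outputs = []
    while True:
        token = queues[-1].get()
        if token is _END:
            break
        outputs.append(token)
    for thread in threads:
        thread.join()
    return outputs


# The workflow structure is defined once (hypothetical actors)...
workflow = [("double", lambda x: 2 * x), ("increment", lambda x: x + 1)]
data = [1, 2, 3]

# ...and bound to two different execution models.
sequential_result = sequential_director(workflow, data)
threaded_result = process_network_director(workflow, data)
```

Both schedulers produce the same outputs for this chain; what differs is how and when the actors fire, which is the part a model of computation decides.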


Hierarchical workflows

Kepler supports hierarchy in workflows, which allows complex tasks to be composed of simpler components. This feature allows workflow authors to build re-usable, modular components that can be saved for use across many different workflows.
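Hierarchy of this kind can be illustrated with a composite that is itself usable wherever a simple actor is. The `Actor` and `CompositeActor` classes below are hypothetical names for this sketch, which assumes single-input, single-output actors chained in sequence.

```python
class Actor:
    """Atomic step: wraps a single function."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def fire(self, value):
        return self.func(value)


class CompositeActor:
    """An actor whose behavior is defined by an inner chain of actors."""

    def __init__(self, name, inner):
        self.name = name
        self.inner = list(inner)

    def fire(self, value):
        # Inner actors may themselves be composites, giving nested workflows.
        for actor in self.inner:
            value = actor.fire(value)
        return value


# A reusable sub-workflow built from two simple actors.
to_celsius = Actor("to_celsius", lambda f: (f - 32) * 5 / 9)
to_kelvin = Actor("to_kelvin", lambda c: c + 273.15)
convert = CompositeActor("fahrenheit_to_kelvin", [to_celsius, to_kelvin])

# The composite is dropped into a larger workflow like any other actor.
rounder = Actor("round", lambda x: round(x, 2))
top_level = CompositeActor("workflow", [convert, rounder])

boiling = top_level.fire(212.0)
freezing = convert.fire(32.0)  # the same component reused on its own
```

Because `convert` exposes the same `fire` interface as its parts, it can be saved and reused without the enclosing workflow knowing how it is built.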


Workflow semantics

Kepler provides a model for the semantic annotation of workflow components using terms drawn from an ontology. These annotations support many advanced features, including improved search capabilities, automated workflow validation, and improved workflow editing (Berkley, Chad, Shawn Bowers, Matthew B. Jones, Bertram Ludaescher, Mark Schildhauer, Jing Tao. 2005. Incorporating Semantics in Scientific Workflow Authoring. 17th International Conference on Scientific and Statistical Database Management. IEEE Computer Society).
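How annotations can drive search and validation can be sketched with a toy term hierarchy. Everything below is hypothetical: the ontology terms, the component names, and the functions are invented for illustration and do not come from Kepler or from any real ontology.

```python
# Hypothetical mini-ontology: child term -> parent term.
ONTOLOGY = {
    "AirTemperature": "Temperature",
    "WaterTemperature": "Temperature",
    "Temperature": "Measurement",
    "Precipitation": "Measurement",
}

# Hypothetical components annotated with the terms they consume or produce.
ANNOTATIONS = {
    "ReadBuoyData": {"output": "WaterTemperature"},
    "ReadRainGauge": {"output": "Precipitation"},
    "TemperaturePlot": {"input": "Temperature"},
}


def is_a(term, ancestor):
    """True if term equals ancestor or is one of its descendants."""
    while term is not None:
        if term == ancestor:
            return True
        term = ONTOLOGY.get(term)
    return False


def find_producers(term):
    """Search: components whose output is the given term or a subtype of it."""
    return sorted(
        name
        for name, annotation in ANNOTATIONS.items()
        if "output" in annotation and is_a(annotation["output"], term)
    )


def can_connect(src, dst):
    """Validation: the source's output must be a kind of the expected input."""
    return is_a(ANNOTATIONS[src]["output"], ANNOTATIONS[dst]["input"])


temperature_sources = find_producers("Temperature")
valid_link = can_connect("ReadBuoyData", "TemperaturePlot")
invalid_link = can_connect("ReadRainGauge", "TemperaturePlot")
```

A search for the general term `Temperature` finds a component annotated with the more specific `WaterTemperature`, and the same subtype check rejects a connection that would feed precipitation data into a temperature plot.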


Sharing workflows

Kepler components can be shared by exporting the workflow or component into a Kepler Archive (KAR) file, which is an extension of the JAR file format from Java. Once a KAR file is created, it can be emailed to colleagues, shared on web sites, or uploaded to the Kepler Component Repository. The Component Repository is a centralized system for sharing Kepler workflows that is accessible via both a web portal and a web service interface. Users can directly search for and use components from the repository from within the Kepler workflow composition GUI.
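Because JAR files are ZIP archives, a format that extends JAR can in principle be inspected with ordinary ZIP tooling. The sketch below uses Python's standard `zipfile` module on a stand-in archive built in memory; the entry names and contents are placeholders, not the actual layout of a KAR file.

```python
import io
import zipfile


def list_archive_entries(data):
    """Return the entry names of a ZIP-based archive given its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


# Build a stand-in archive in memory. The entries are hypothetical
# placeholders; a real KAR file may be organized differently.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    archive.writestr("workflow.xml", '<entity name="demo"/>\n')

entries = list_archive_entries(buffer.getvalue())
```

To inspect a file on disk instead, the same function can be given the bytes read from that file.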


Provenance

Provenance is a critical concept in scientific workflows, since it allows scientists to understand the origin of their results, to repeat their experiments, and to validate the processes that were used to derive data products. For a workflow to be reproduced, provenance information must be recorded that indicates where the data originated, how it was altered, and which components and parameter settings were used. This allows other scientists to repeat the experiment and confirm the results (http://www.adambarker.org/papers/ppam08.pdf). Current systems offer little support for end users to query provenance information in scientifically meaningful ways, in particular when advanced workflow execution models go beyond simple DAGs (as in process networks) (Shawn Bowers, Timothy McPhillips, Bertram Ludascher, Shirley Cohen, Susan B. Davidson. 2006. Model for User-Oriented Data Provenance in Pipelined Scientific Workflows).
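The kind of record described above (where the data came from, how it was altered, and which components and parameters were used) can be sketched as a log written while a pipeline runs. This is an illustrative assumption, not Kepler's provenance format: the function names, the log fields, and the source name `sensor_a.csv` are all hypothetical.

```python
import hashlib
import json


def fingerprint(value):
    """Short, stable hash of a JSON-serializable value."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_with_provenance(source, data, steps):
    """Apply each (name, func, params) step and log what happened."""
    log = [{"component": "source", "origin": source, "output": fingerprint(data)}]
    for name, func, params in steps:
        result = func(data, **params)
        log.append({
            "component": name,            # which component ran
            "parameters": params,         # with what settings
            "input": fingerprint(data),   # what it consumed
            "output": fingerprint(result) # what it produced
        })
        data = result
    return data, log


def scale(values, factor):
    return [v * factor for v in values]


def clip(values, upper):
    return [min(v, upper) for v in values]


result, log = run_with_provenance(
    "sensor_a.csv",  # hypothetical origin label, no file is read
    [1, 2, 3],
    [("scale", scale, {"factor": 10}), ("clip", clip, {"upper": 25})],
)
```

Each entry's `input` hash matches the previous entry's `output` hash, so the chain from the original data to the final product can be traced and the run repeated with the same parameters.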


Kepler history

The Kepler Project was created in 2002 by members of the Science Environment for Ecological Knowledge (SEEK) project and the Scientific Data Management (SDM) project. The project was founded by researchers at the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California, Santa Barbara and the San Diego Supercomputer Center at the University of California, San Diego. Kepler extends Ptolemy II, which is a software system for modeling, simulation, and design of concurrent, real-time, embedded systems developed at UC Berkeley. Collaboration on Kepler quickly grew as members of various scientific disciplines realized the benefits of scientific workflows for analysis and modeling and began contributing to the system. As of 2008, Kepler collaborators came from many science disciplines, including ecology, molecular biology, genetics, physics, chemistry, conservation science, oceanography, hydrology, library science, computer science, and others. Kepler is a workflow orchestration engine: it simplifies work by letting users build workflows out of actors.


See also

* Apache Taverna
* Discovery Net
* VisTrails
* LONI Pipeline
* Bioinformatics workflow management systems
* DataONE Investigator Toolkit


External links


Kepler Project website

Kepler Component Repository

Ptolemy II project website

Knowledge Network for Biocomplexity (KNB) Data archive

List of software tools related to workflows on the DataONE website